[Codex] Add handling for Conversational RAG to Validator API #84
Conversation
    assert all(scores[k]["score"] == raw_scores[k]["score"] for k in raw_scores)

def test_prompt_tlm_with_message_history() -> None:
Add a test to confirm that no query rewriting happens when this is the first user message.
Add a test to confirm that the primary TrustworthyRAG.score(prompt, response) call happens with prompt reflecting the full chat history, not with prompt reflecting the rewritten query.
Confirm you are using this TLM utils method:
cleanlab/cleanlab-tlm@a479e32
to turn the chat history into a prompt string.
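For illustration, here is a minimal sketch of what such a history-flattening helper could look like. The function name and role formatting below are hypothetical stand-ins, not cleanlab-tlm's actual API; the real utility referenced in the commit above may format things differently.

```python
from typing import Any


def chat_history_to_prompt(messages: list[dict[str, Any]]) -> str:
    """Flatten a chat history into a single prompt string.

    Hypothetical stand-in for the cleanlab-tlm utility referenced above.
    """
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # cue the model to produce the next turn
    return "\n".join(lines)


messages = [
    {"role": "user", "content": "What is RAG?"},
    {"role": "assistant", "content": "Retrieval-augmented generation."},
    {"role": "user", "content": "How does it reduce hallucinations?"},
]
prompt = chat_history_to_prompt(messages)
```

A test along the lines suggested above could then assert that TrustworthyRAG.score receives this full-history prompt rather than the rewritten query.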
    return "other_issues"

def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
I think the name validate_messages should be chosen more carefully, given that the validator module already reserves the method name validate on Validator for computing the trustworthiness and Eval scores.
I'd bet we wouldn't change the Validator.validate API, but we could find a different name for validate_messages since it behaves quite differently.
Consider having validate_messages take messages as a required positional argument:

-    def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
+    def validate_messages(messages: list[dict[str, Any]]) -> None:

Everywhere it's called, it receives a messages argument.
The caller already sets a default value for that argument, so I'd advise against setting default values in two function signatures.
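A quick sketch of the suggested signature, with a hypothetical validation body (the actual checks in the PR may differ):

```python
from typing import Any


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Raise if the chat history is malformed.

    Sketch of the suggested required-positional signature; the
    validation rules below are illustrative, not the PR's actual ones.
    """
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content' keys")


validate_messages([{"role": "user", "content": "hi"}])  # passes silently
```

With a required argument, a forgotten messages value fails immediately at the call site instead of silently validating None, and the default lives in exactly one place (the caller).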
    codex_answer, _ = self._project.query(question=query, metadata=metadata)
    return codex_answer

def _maybe_rewrite_query(self, *, query: str, messages: list[dict[str, Any]]) -> str:
This _maybe... prefix implies that we might get something other than a string back from the method. Should the check for self._tlm be done by the caller instead?
The "maybe" is meant to suggest that we might rewrite the query, or might not.
Closed because the conversational capability moved to the backend.
No description provided.